Unraveling the Distribution and Evolution of miR156-targeted
SPLs in Plants by Phylogenetic Analysis
Abstract: Squamosa promoter-binding protein-like genes (SPLs) are critical during plant development and are mostly regulated by miR156. However, little is known about the phylogenetic distribution and evolutionary patterns of miR156-targeted SPLs. In this study, 183 SPLs from nine genome-sequenced species representing algae, bryophytes, lycophytes, monocots, and eudicots were computationally analyzed. Our results showed that miR156 responsive elements (MREs) on SPLs were present in land plants but absent from unicellular green algae. Phylogenetic analysis revealed that miR156-targeted SPLs were distributed only in group II, not group I, of land plants, suggesting that they originated from a common ancestor. In addition, group II was further divided into seven subgroups (IIa-IIg), and miR156-targeted SPLs were found in specific members of six subgroups, the exception being subgroup IId. This distribution pattern was well explained by the gene structure evolution of miR156-targeted SPLs, based on the correlation between phylogenetic classification and gene structure: these genes may have undergone exon loss events combined with MRE loss during evolution. Moreover, gene duplication contributed to the abundance of miR156-targeted SPLs, which increased significantly after angiosperms and lower plants split. With Arabidopsis as the model species, we found that segmental and tandem gene duplications predominated during the expansion of miR156-targeted SPLs. Taken together, these results provide better insight into the functional diversity and evolution of miR156-targeted SPLs in plants.


Key words: Phylogenetic analysis; Gene duplication; Gene structure; MicroRNA; Transcription factor


Abbreviations: CDS, coding sequence; CR, Chlamydomonas reinhardtii; DBD, DNA-binding domain; miRNAs, microRNAs; MITEs, miniature inverted repeat transposable elements; ML, maximum-likelihood; MREs, miR156 responsive elements; NJ, Neighbor-Joining; NLS, nuclear localization signal; SBP, Squamosa promoter-Binding Protein; SPL, Squamosa promoter-binding protein-like gene


Squamosa promoter-binding protein-like genes (SPLs) encode plant-specific transcription factors (TFs) that share a highly conserved Squamosa promoter-Binding Protein (SBP) domain and recognize similar target DNA sequences. The SBP-domain spans 79 amino acid residues and features a sequence-specific DNA-binding domain (DBD). The DBD contains two zinc-binding sites, assembled as Cys-Cys-His-Cys (Cys2HisCys) and Cys-Cys-Cys-His (Cys3His) (Yamasaki et al., 2004), and a highly conserved bipartite nuclear localization signal (NLS) at the C-terminus (Birkenbihl et al., 2005). The SBP-domain has been shown to bind specifically to sequences containing a palindromic GTAC core motif (Birkenbihl et al., 2005; Cardon et al., 1997).

Numerous studies have described the functions of SBP-box genes in different plants through analysis of loss-of-function or gain-of-function mutants. SPLs are known to be critical in diverse biological processes, including seed germination and seedling development (Martin et al., 2010), leaf development (Moreno et al., 1997), phase transition (Gandikota et al., 2007; Wang et al., 2009; Wu and Poethig, 2006), fruit ripening (Manning et al., 2006), copper homeostasis (Kropat et al., 2005; Yamasaki et al., 2009) and grain yield (Jiao et al., 2010; Miura et al., 2010). In fact, it is difficult to pinpoint the exact functions of SPL transcription factors in development because of their extreme genetic redundancy and regulatory complexity. Recently, a variety of elegant approaches have elucidated the regulatory mode of SPLs and miR156 at different stages of plant development. Their interplay provides a paradigm for how these SPLs exert their functions in development. For example, the low-level expression of SPLs in miR156-overexpressing mutants prolonged the juvenile phase in both maize (Chuck et al., 2007) and Arabidopsis (Wu and Poethig, 2006). Another case is the validation of the miR156-miR172 regulatory cascade, mediated by SPL9, in the Arabidopsis juvenile-to-adult phase transition. In this case, evidence has been obtained for the direct regulation of miR172b by SPL9, a miR156 target (Wu et al., 2009). Over the past few years, many researchers have worked to reveal the functions of miR156-regulated developmental programs by analyzing the spatio-temporal expression patterns of miR156 and its targets and by characterizing mutations in Arabidopsis and maize. The regulatory functions of SPL transcription factors in relation to miR156 have been documented in several reviews (Chen et al., 2010; Fornara and Coupland, 2009; Nonogaki, 2010; Poethig, 2010).

A large body of work has shown that miR156 families and their targeted SPLs are conserved throughout land plants. With respect to the miR156 families, evolutionary studies have benefited from large-scale small RNA sequencing and cloning projects in many species, in particular the most ancient land plants (e.g. moss). Comparison of mature miR156 sequences showed that the miR156 family is conserved between core eudicots and mosses, indicating its presence in the earliest common ancestor of land plants. In addition, the SPLs targeted by miR156 families are also conserved in plants. For example, a conserved SBP protein target of miR156 has been cloned from the moss Physcomitrella patens and shown to be cleaved within the predicted target site (Arazi et al., 2005). More recently, Guo and colleagues demonstrated that the miR156 target site in SPLs is almost perfectly conserved in all land plants analyzed, but not in the unicellular green alga Chlamydomonas reinhardtii (Guo et al., 2008).

Current evidence indicates that considerable functional divergence exists among SPLs (including miR156-targeted SPLs) in plants. For example, in Arabidopsis, SPL3, SPL4 and SPL5 appear to function mostly in the control of flowering time and phase change (Fornara and Coupland, 2009; Wu and Poethig, 2006), whereas SPL9 and SPL15 have strong effects on leaf initiation (Schwarz et al., 2008). In addition, analyses of gene structures, phylogeny and motif elements have shown that SBP-box genes diversified during evolution (Guo et al., 2008; Riese et al., 2007; Yang et al., 2008). Taking motif elements as an example, some are conserved between moss and seed plants (Guo et al., 2008; Riese et al., 2007), whereas others became species-specific after the split of monocotyledons and dicotyledons (Yang et al., 2008). Although previous studies illustrated the diversity of the SBP-box gene family during plant evolution, they did not detail the evolutionary pathway of miR156-targeted SPLs. For example, when was such a large set of important MREs established in the SPLs? Why are some SBP-box genes targeted by miR156 while others are not? What are the differences between miR156-targeted SPLs and non-targeted SPLs during evolution? These questions prompted us to gather evolutionary information on targeted SPLs over long evolutionary timescales. Surveying the occurrence of miR156-targeted SPLs in plants and mapping this information onto a comprehensive plant phylogeny will update our knowledge of their origin and facilitate interpretation of their evolutionary pathway and functional divergence among distantly related plant species. In addition, the complete sequencing of numerous plant genomes (in particular, those of green algae and moss) enables the comprehensive collection of information on SBP-box genes. Currently, two integrative transcription factor databases are available online, which document the SBP-box TF family and other TF families in lower and higher plants (He et al., 2010; Perez-Rodriguez et al., 2009). These resources allow us to perform extensive phylogenetic analyses of miR156-targeted SPLs and to explore their evolutionary history based on their phylogenetic distribution.


Materials and methods 
SPL sequences collection 

The protein, domain and mRNA sequences of SBP-box genes were downloaded from PlnTFDB v3.0 (Riano-Pachon et al., 2007). The collected sequences covered nine genome-sequenced species: one alga (Chlamydomonas reinhardtii), one moss (Physcomitrella patens), one lycophyte (Selaginella moellendorffii), three eudicots (Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera) and three monocots (Oryza sativa subsp. japonica, Zea mays and Sorghum bicolor) (Table 1 and Supplementary Table 1). Gene and coding sequence (CDS) data were downloaded from the DOE Joint Genome Institute (JGI) (http://www.jgi.doe.gov/) and several species-specific genome annotation databases: The Arabidopsis Information Resource (TAIR) 10 genome release (http://www.arabidopsis.org/), the TIGR rice genome annotation database release 6.1 (http://rice.plantbiology.msu.edu/), and the maize sequence genome database release 5b.60 (http://www.maizesequence.org/index.html). The transcript sequences of grape SBP-box genes were used as BLAST queries to obtain the corresponding gene sequences and CDS in Phytozome v6.0 (http://www.phytozome.net). Obsolete locus identifiers and newly added SPLs were uniformly assigned the locus identifiers from JGI or the species genome annotation databases. A total of 183 SBP-box genes were obtained; the complete catalog, including the obsolete and newly added genes, is available in Supplementary Table 1.

Prediction of miR156 responsive elements within 
SPL genes 


Mature miR156 sequences of eight species (none were found in green algae) were downloaded from the miRBase database (Release 17.0) (Kozomara and Griffiths-Jones, 2011). They were used to predict SPL targets with miRU under default settings (Zhang, 2005). To further increase the stringency of the predicted miR156 targets, we used empirical parameters as a second filter (Schwab et al., 2005). These parameters were designed to reflect the molecular mechanisms assumed to underlie miRNA target recognition. The empirical parameters used in this study, applied with minor modification, were as follows: no mismatch at positions 10 and 11; no more than one mismatch at positions 2-12; no more than two consecutive mismatches downstream of position 13; and no more than three mismatches in total. Applying these rules, our analysis predicted 61 SPL genes as putative targets of the miR156 family (Table 1 and Appendix 1).
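The four empirical rules listed above can be read as a simple filter over the mismatch positions of a miRNA/target duplex. The following is a minimal sketch of that reading, not the implementation used in this study: the function name `passes_empirical_filter` is hypothetical, positions are assumed to be 1-based and counted from the 5' end of the miRNA, and "downstream of position 13" is assumed to mean positions greater than 13.

```python
from typing import Iterable


def passes_empirical_filter(mismatch_positions: Iterable[int]) -> bool:
    """Return True if a duplex satisfies the four empirical rules.

    mismatch_positions: 1-based positions (assumed to be counted from the
    5' end of the miRNA) at which the miRNA and the target do not pair.
    """
    mm = sorted(set(mismatch_positions))

    # Rule 1: no mismatch at positions 10 and 11.
    if 10 in mm or 11 in mm:
        return False

    # Rule 2: no more than one mismatch at positions 2-12.
    if sum(1 for p in mm if 2 <= p <= 12) > 1:
        return False

    # Rule 3: no more than two consecutive mismatches downstream of
    # position 13 (assumed here to mean positions > 13).
    run = 0
    prev = None
    for p in (q for q in mm if q > 13):
        run = run + 1 if prev is not None and p == prev + 1 else 1
        if run > 2:
            return False
        prev = p

    # Rule 4: no more than three mismatches in total.
    return len(mm) <= 3


if __name__ == "__main__":
    print(passes_empirical_filter([3, 14, 15]))   # one seed mismatch, two downstream
    print(passes_empirical_filter([14, 15, 16]))  # three consecutive downstream mismatches
```

Taking mismatch positions as input keeps the sketch independent of how the duplex is aligned; treatment of G:U pairs and bulges is deliberately left out because the text does not specify it.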
Sequence alignment and phylogenetic analysis 
Protein and domain sequences from the nine species were initially aligned using CLUSTALX (Thompson et al., 1997) and manually adjusted in Se-Al v2.0 (http://evolve.zoo.ac.uk) whenever necessary. Only the SBP-box domains were used for the phylogenetic analysis, because the protein sequences showed no consensus when the SBP-box domains were masked. We used PHYLIP (v3.6) (http://www.bioinformatics.uthscsa.edu/www/phylip/) to construct the neighbor-joining (NJ) and maximum-likelihood (ML) trees following the method of Guo et al. (2008). Support values were assessed using 1000 bootstrap replicates; only clades with bootstrap values higher than 50 are shown.
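A bootstrap test of this kind rests on resampling alignment columns with replacement and rebuilding the tree from each pseudo-replicate. The sketch below illustrates only the resampling step under simple assumptions; the function name `bootstrap_alignment` and the toy sequences are hypothetical, and the tree-building step performed by the phylogenetics package is not reproduced here.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement until the replicate has the same
    number of columns as the input; rows (sequences) stay in the same order.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column indices with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


if __name__ == "__main__":
    toy = ["CQVEGC", "CQVDGC", "CQAEGC"]  # toy aligned fragments, not real data
    rng = random.Random(0)
    # 1000 replicates, matching the number of bootstrap tests in the text.
    replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
    print(len(replicates), replicates[0])
```

The support value of a clade is then the share of replicate trees in which that clade is recovered.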
Intron/exon structure and sequence logo analysis 
The CDS and genomic sequences of SPL genes were used to derive intron/exon structures with the Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.cn/). Sequence logos were generated using WebLogo (http://weblogo.berkeley.edu/logo.cgi).
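In a sequence logo, the overall height of the letter stack at each position reflects the information content of that alignment column. As a rough illustration of that quantity, the sketch below computes the per-column information content in bits for a protein alignment. It assumes a 20-letter alphabet, ignores gaps, and omits the small-sample correction that logo software may apply; the function name `column_information` and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str], alphabet_size: int = 20) -> List[float]:
    """Information content (bits) of each column: log2(alphabet) - entropy."""
    n_cols = len(alignment[0])
    info = []
    for i in range(n_cols):
        # Gap characters are ignored when counting residues.
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


if __name__ == "__main__":
    toy = ["CC", "CH"]  # first column fully conserved, second column split
    for pos, bits in enumerate(column_information(toy), start=1):
        print(f"position {pos}: {bits:.2f} bits")
```

A fully conserved column, such as an invariant Cys in a zinc-binding site, reaches the maximum of log2(20) bits, while variable columns score lower.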








Chromosomal distribution and duplication analysis 

The locations of SBP-box genes on Arabidopsis chromosomes were mapped with the Chromosome Map Tool at TAIR (http://arabidopsis.org/jsp/ChromosomeMap/tool.jsp). Gene duplications and their presence on duplicated chromosomal segments were investigated using “Paralogons in Arabidopsis thaliana” with the default parameters (Blanc et al., 2003; Vision et al., 2000; Wang et al., 2008). Only the blocks containing SBP-box genes were retained; the detected genes were then mapped onto the chromosomes and manually linked to each other by lines.


Results and discussion 
MREs are specific to SPLs of land plant lineages 
A total of 183 SPL genes were obtained from nine species representing the main lineages of green plants: green alga (C. reinhardtii), moss (P. patens), lycophyte (S. moellendorffii), monocots (rice, sorghum and maize) and eudicots (Arabidopsis, grape and poplar) (Table 1 and Appendix 1). These genomes have been fully sequenced, and all putative members of the SBP-box gene family have been identified according to their domain structure (Perez-Rodriguez et al., 2009). For example, the number of SPLs from green algae reached 23, three times more than in the previous report (Guo et al., 2008), which was made when the genome sequence had not yet been released (Table 1). These species therefore open the possibility of a comprehensive analysis of MREs within SPL genes.

Table 1 The number of SBP-box genes in nine representative plants

Lineage     Organism                     SPLs   Targeted SPLs
Alga        Chlamydomonas reinhardtii    23     0
Moss        Physcomitrella patens        14     5
Lycophyte   Selaginella moellendorffii   11     0
Monocots    Oryza sativa                 19     12
            Sorghum bicolor              19     9
            Zea mays                     33     13
Eudicots    Arabidopsis thaliana         17     11
            Populus trichocarpa          29     6
            Vitis vinifera               18     5
Total                                    183    61

High-confidence prediction of miR156 targets was performed with miRU based on sequence complementarity and evolutionary conservation (Zhang, 2005). To further increase the stringency of the predicted miR156 targets, we used empirical parameters as a second filter (Schwab et al., 2005). These parameters take into account more features than sequence complementarity and conservation alone (see Materials and methods for details). Finally, 61 out of 183 SPL genes were identified as putative target genes of the miR156 family with high probability (Table 1 and Appendix 1). We roughly estimated the accuracy of the putative targets by checking them against experimental data. For example, all the putative targets in Arabidopsis and rice have been experimentally validated by several independent laboratories (Li et al., 2010; Xie et al., 2006; Xing et al., 2010), indicating that our prediction of conserved miR156 targets was highly accurate. The predictions showed that MREs were not found in green algae or Selaginella moellendorffii, whereas they were observed in the other seven land plants. With respect to green algae, we noticed that no miR156 homologs have been identified since the publication of its genome sequence (Worden et al., 2009). To further confirm our prediction, we used the miR156 family members from the land plants to predict MREs within the SPLs of green algae. Still no MREs were found in the SPLs of green algae. Meanwhile, previous studies indicated that no universal miRNA regulatory pathways (including the miR156-regulatory pathway) exist among land plants and green algae (Guo et al., 2008; Molnar et al., 2007). We therefore conclude that miR156 targets are indeed absent from unicellular green algae. By contrast, miR156 homologs and SPL genes were identified in Selaginella moellendorffii, but no MREs were predicted in our analysis. One possible explanation is that the interaction sites between miR156 and SPLs had more than four mismatches and therefore did not qualify as MREs under the prediction criteria used in this study (data not shown). Altogether, these analyses indicate that the miR156-regulatory pathway arose after the divergence of multicellular land plants and unicellular green algae.
Phylogenetic distribution of miR156-targeted SPLs
in land plants

To understand the evolutionary history of these targeted SPLs, we constructed an unrooted neighbor-joining (NJ) tree for all the SPLs of land plants (Fig. 1). In addition, we obtained another tree with similar topology using the maximum-likelihood (ML) method (data not shown). As shown in Fig. 1, all SBP-box gene sequences of land plants were resolved into two major clades (group I and group II). Ten SPL genes from moss, lycophyte and several flowering plants, whose SBP-domain carries four Cys residues, formed group I (Fig. 1 and Fig. 2: A). A large number of SPLs from each land plant lineage clustered into group II, where they were further divided into seven subgroups (IIa-IIg). Group II had an SBP-domain with a Cys3His motif, which differed from group I but was the same as in the CR group (Fig. 2: B, C). Based on the phylogenetic data, we speculate that the last common ancestor of land plants had at least two classes of SBP-box genes. miR156 targets were distributed in group II but not in group I (Fig. 1), indicating that they originated from a common ancestor and arose after the divergence of group I and group II.

Similarly, an uneven distribution of miR156 targets across the subgroups of group II was also apparent (Fig. 1 and Table 2). First, with the exception of subgroup IId, targeted SPLs were present in the remaining six subgroups, but they were restricted to some members of these six branches (Fig. 1). More strikingly, we found a lineage-specific distribution pattern of targeted SPLs. For example, the two targeted SPLs in subgroup IIe were both from moss, whereas subgroups IIa, IIb and IIf contained only angiosperm targeted SPLs (Fig. 1 and Table 2). A similar distribution pattern has been reported for the binding sites of miR172, a downstream regulatory factor of miR156 targets, which are restricted to some member sequences of the euAP2 group of the AP2-like family (Kim et al., 2006). The restricted distribution of miR156 targets suggests that miR156 targets selected SBP-box genes in distinct lineages and is therefore involved in regulating particular functions underlying lineage-specific characters. Second, the abundance of miR156-targeted SPLs in angiosperm lineages differed markedly across evolutionary timescales. Angiosperm SPLs that cluster with lower plant SPLs (e.g. moss SPLs) are inferred to have evolved early, and vice versa. Among the seven subgroups, angiosperm SPLs and lower plant SPLs shared four subgroups (IIc, IId, IIe and IIf), in which 20 out of 70 SPLs were targeted by miR156. By contrast, the remaining three subgroups (IIa, IIb and IIg) contained only angiosperm SPL genes, among which 35 targeted SPLs were detected out of 55 SPLs (Table 2). By comparing the abundance of targeted SPLs in angiosperms across subgroups of different evolutionary ages, we concluded that targeted SPLs of angiosperms mainly increased after angiosperms and lower plants split. More importantly, we found that a majority of targeted SPLs were enriched in gene pairs among different angiosperm lineages of group II. This implies that gene duplication led to the increase of miR156 targets in angiosperms after the divergence of angiosperms from lower plants.
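The comparison above comes down to two proportions, which can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text (20 of 70 and 35 of 55); the function and variable names are hypothetical, and no statistical test is implied.

```python
def targeted_fraction(targeted: int, total: int) -> float:
    """Fraction of SPLs in a set that are predicted miR156 targets."""
    if total <= 0:
        raise ValueError("total must be positive")
    return targeted / total


# Counts quoted in the text for angiosperm SPLs.
shared = targeted_fraction(20, 70)           # subgroups shared with lower plants
angiosperm_only = targeted_fraction(35, 55)  # subgroups with angiosperm SPLs only

if __name__ == "__main__":
    print(f"shared subgroups:          {shared:.1%}")
    print(f"angiosperm-only subgroups: {angiosperm_only:.1%}")
```

On these counts, the targeted fraction in the angiosperm-only subgroups is more than twice that in the subgroups shared with lower plants.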


Fig. 1 Phylogenetic tree of SBP-box genes across different species. SBP-box domain sequences of nine plant species were analyzed; an unrooted tree was constructed using the Neighbour-Joining (NJ) method with 1000 bootstrap replicates. Line styles in the original figure distinguish green algae, moss, lycophyte, monocots and eudicots. Note: The digits inside the branches indicate the support values and those outside the branches indicate the number of miR156-targeted SPLs (see Appendix 1 for details)

Fig. 2 Sequence logos of the SBP-box domain. The sequence logos of the SBP-box domain of group I (A), group II (B) and the CR group (C). The overall height of the letters at each position represents the degree of conservation. The two conserved zinc finger structures and the NLS are indicated for the three groups


Table 2 The distribution of miR156-targeted SPLs on each subgroup of group II

Subgroup       No. of SPLs   Lower plants (a)   Higher plants (b)
Subgroup IIa   22            0 (0)              22 (15)
Subgroup IIb   11            0 (0)              11 (11)
Subgroup IIc   16            7 (3)              9 (2)
Subgroup IId   27            4 (0)              23 (0)
Subgroup IIe   24            10 (2)             14 (0)
Subgroup IIf   25            1 (0)              24 (18)
Subgroup IIg   25            0 (0)              25 (10)
Total          150           23 (5)             128 (56)

(a) and (b): The digit in brackets indicates the number of miR156-targeted SPLs in lower plants and higher plants, respectively


Gene structure analysis of miR156-targeted SPLs 

The phylogenetic distribution patterns of miR156 targets could shed light on the evolutionary pathway that shaped their history. To investigate this possibility, we analyzed and compared the gene structures of targeted and non-targeted SPLs, because gene structure is an important indicator for classifying genes. We reconstructed an NJ tree based on 36 SBP-box proteins from a eudicot (Arabidopsis) and a monocot (Oryza sativa) and carried out intron/exon structure analysis (Fig. 3: A). Notably, the intron/exon structure correlated with the classification of SPL genes based on the phylogenetic analysis. For example, SPL genes in subgroup IId and group I had 11 and 10 exons, respectively, while all SPL genes of subgroups IIa and IIf had four and three exons, respectively (Fig. 3: B). The apparent correlation between intron/exon structures and the classes of SPL genes was probably due to the expansion of SPLs in each clade by ancient and recent duplication events. An alternative possibility is that the intron/exon structures of SPL genes reached a certain level of stability at the late stages of angiosperm evolution. This good correlation between phylogenetic relationship and gene structure therefore helps in understanding the evolution of the gene structure of targeted SPLs and in interpreting their distribution patterns.
Fig. 3 Intron/exon structure in conjunction with the phylogenetic tree for the Arabidopsis and rice SBP-box proteins and the structure of targeted SPLs in moss. (A) Schematic diagram of the phylogenetic tree reconstructed from a complete alignment of 17 Arabidopsis and 19 rice SBP-box proteins. (B) Intron/exon structures of SBP-box genes of Arabidopsis and rice. (C) Intron/exon structures of targeted SPLs in moss. The genes marked with a star in the phylogenetic tree are regulated by miR156. As shown in the legend, blank boxes stand for CDS, horizontal lines for introns, black boxes for UTR regions, and vertical bars indicate the positions of MREs. The phylogenetic relationships of groups and subgroups are presented in Figure 1. Four SBP-box proteins not assigned to the clades corresponding to the domain tree are shown in bold italics
As shown in Fig. 3B, the coding sequences of SBP-box genes were split into a variable number of exons, ranging from two to eleven. The targeted SPL genes from different subgroups had similar structural features, with two to four exons. In addition, all the targets belonged to a monophyletic clade. By contrast, non-targeted SPLs were the most divergent in gene structure and could be divided into two classes according to their phylogenetic position. One class of non-targeted SPLs lay outside the monophyletic clade to which the targeted SPLs belonged, such as the genes from subgroup IId. This class of non-targeted SPLs contained more exons (at least ten) than the targeted SPLs but was similar to the SPL genes of group I. We therefore speculate that the targeted SPLs might have undergone exon loss events during evolution. Furthermore, moss, an early-branching lineage of land plants, provides a window into the early evolution of targeted SPLs in land plants. Fig. 3C shows that a portion of the targeted SPLs in moss possessed the ancient gene structure, with 6 to 13 exons, such as targeted SPLs 69445, 93998 and 168927. The exons of miR156-targeted SPLs might have been lost from different regions. First, previous studies suggested that SBP-domains lay in the first two exons and possessed a conserved intron position (Guo et al., 2008; Xie et al., 2006). The authors found that some moss SBP-box genes had exons upstream of the SBP-domain, providing evidence of exon loss from the 5'-end flanking region of the SBP-domain. In our study, we found that the MREs within the above three targeted SPLs in moss lay in exons other than the last one. By contrast, most other MREs were located in the last exon, and some resided in 3' UTR regions. These results indicate that exons of targeted SPLs might also have been lost from the 3'-end regions. A mechanistic explanation for these scenarios is that the exons might be lost from the 3' portion of SPLs through homologous recombination of their cDNAs (Derr, 1998; Mourier and Jeffares, 2003).


The second class of non-targeted SPLs (e.g. the genes from subgroup IIe) had a gene structure similar to that of targeted SPLs, with no more than four exons (Fig. 3: A, B). However, they were embedded within the same monophyletic clade as the targeted SPLs. One possible explanation is that these non-targeted SPLs were originally targeted by miR156 and subsequently lost their miR156 binding sites. To test this hypothesis, we further analyzed the phylogenetic relationships among targeted SPLs, because paraphyly of miRNA targets on the phylogenetic tree may indicate MRE loss. Indeed, five targeted SPL genes from subgroup IIb (e.g. LOC_Os08g39890 and LOC_Os09g31438) and one target gene from subgroup IIg (LOC_Os04g46580) formed a paraphyletic branch (Fig. 3: A). However, LOC_Os04g46580 and non-targeted SPL genes in subgroup IIe (e.g. LOC_Os02g08070 and LOC_Os04g56170) clustered together in one branch. Such a distribution pattern of MREs suggests either a loss or, alternatively, a gain of miR156 targeting in closely related genes. It more likely reflects the loss of an MRE after a duplication event, because the latter scenario is less likely unless recombination or gene conversion events were involved. Overall, these analyses indicate that targeted SPLs mainly experienced exon loss events, followed by some MRE loss, during evolution.
Gene duplication of miR156-targeted SPLs in 
Arabidopsis 

Apart from gene structure, gene duplication was also an important factor influencing the distribution pattern of targeted SPLs. As shown in Fig. 3A, more than half of the SBP-box genes formed gene pairs: 12 paralogous gene pairs and 2 orthologous gene pairs were identified based on protein analysis. We observed that the paralogous gene pairs in each lineage were mainly regulated by miR156. For example, 5 out of 7 paralogous gene pairs in Arabidopsis were miR156 targets. This result suggests that duplication events within each lineage were the main source of targeted SPLs and influenced their abundance on the phylogenetic tree. It is therefore important to study the duplication mechanisms in order to interpret the distribution pattern. In our study, we focused on Arabidopsis. The genome of this species has undergone at least two large-scale segmental duplication events, which had a great impact on the amplification of gene families (including targeted SPLs). One was the recent polyploidy duplication, which occurred before the split of Arabidopsis and Brassica rapa about 24-40 Mya. The other was an older duplication between chromosomal blocks after the monocot-eudicot divergence around 120 Mya (Blanc et al., 2003; Bowers et al., 2003; Vision et al., 2000). Considering these factors, we investigated SBP-box gene duplication and distribution on all five Arabidopsis chromosomes. The blocks arising from the recent polyploidy duplication were explored with the “Paralogons in Arabidopsis thaliana” search engine (Wang et al., 2008).

Fig. 4 shows three pairs of recently duplicated blocks containing SBP-box genes. The two regions on chromosome 1 containing AT1G20980 and AT1G76580 formed a duplicated segmental block pair. The region containing AT1G53160 on chromosome 1 and the region containing AT3G15270 on chromosome 3 formed another duplicated segmental block pair. The regions on chromosome 2 and chromosome 3 comprised two duplicated segmental pairs: AT2G42200 and AT3G57920, and AT2G47070 and AT3G60030. Among the four segmental pairs, two duplicated gene pairs were targeted by miR156. All of these segmentally duplicated genes were also found to be paralogous in the phylogenetic analysis shown in Fig. 1. These results indicate that segmental duplication was a major route of SBP-box gene birth (in particular for the targeted SPLs) in Arabidopsis.

Fig. 4 Chromosomal distribution and duplication events for Arabidopsis SBP-box genes. The five chromosomes of Arabidopsis are depicted, with the 17 SBP-box family genes distributed on them. Only the duplicated regions containing SBP-box genes are shown. Black lines connect corresponding sister gene pairs in duplicated blocks (blank boxes). AT1G27360 and AT1G27370, and AT5G50570 and AT5G50670, are clustered as tandem repeats
In addition, two tandem duplication events were found, on chromosomes 1 and 5, respectively. AT1G27360 and AT1G27370 were two genes with high DNA sequence similarity, separated by only 1 kb on chromosome 1. The other two genes, AT5G50570 and AT5G50670, had almost identical sequences, although they lay about 31 kb apart. Both gene pairs were targeted by miR156. Altogether, large-scale segmental duplication and tandem duplication events in Arabidopsis increased the abundance of targeted SPLs and appear to have contributed exclusively to the current complexity of the targeted SPLs and their gene family.
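The pair counts in this section can be cross-checked against the catalog of SBP-box genes. The sketch below is a minimal illustration: the set of targeted loci is taken from the Arabidopsis entries of the catalog that carry a target site position, the pairs are those named in the text, and the helper name `fully_targeted_pairs` is hypothetical.

```python
from typing import List, Tuple

# Arabidopsis SBP-box loci listed in the catalog with a target site position.
TARGETED = {
    "AT2G33810", "AT3G15270", "AT1G27360", "AT1G27370", "AT1G53160",
    "AT1G69170", "AT2G42200", "AT3G57920", "AT5G43270", "AT5G50570",
    "AT5G50670",
}

# Segmentally duplicated gene pairs named in the text.
SEGMENTAL: List[Tuple[str, str]] = [
    ("AT1G20980", "AT1G76580"),
    ("AT1G53160", "AT3G15270"),
    ("AT2G42200", "AT3G57920"),
    ("AT2G47070", "AT3G60030"),
]

# Tandemly duplicated gene pairs named in the text.
TANDEM: List[Tuple[str, str]] = [
    ("AT1G27360", "AT1G27370"),
    ("AT5G50570", "AT5G50670"),
]


def fully_targeted_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the pairs in which both members are predicted miR156 targets."""
    return [p for p in pairs if p[0] in TARGETED and p[1] in TARGETED]


if __name__ == "__main__":
    print("segmental:", fully_targeted_pairs(SEGMENTAL))
    print("tandem:   ", fully_targeted_pairs(TANDEM))
```

Under these inputs, two of the four segmental pairs and both tandem pairs consist entirely of predicted targets, in line with the counts given above.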
The catalog of 183 SBP-box genes in nine species

No.   Species                                  Locus ID            Target site position
1     Arabidopsis thaliana (Arabidopsis)       AT2G33810           3' UTR
2                                              AT3G15270           3' UTR
3                                              AT1G27360           CDS
4                                              AT1G27370           CDS
5                                              AT1G53160           CDS
6                                              AT1G69170           CDS
7                                              AT2G42200           CDS
8                                              AT3G57920           CDS
9                                              AT5G43270           CDS
10                                             AT5G50570           CDS
11                                             AT5G50670           CDS
12                                             AT1G02065
13                                             AT1G20980
14                                             AT2G47070
15                                             AT3G60030
16                                             AT5G18830
17                                             AT1G76580
18    Chlamydomonas reinhardtii (green algae)  93505
19                                             96716
20                                             101247
21                                             101657
22                                             105679
23                                             106739
24                                             108149
25                                             108444
26                                             115124
27                                             115254
28                                             118761
29                                             120852
30                                             121606
31                                             121939
32                                             170753
33                                             171833
34                                             186869
35                                             195928
36                                             288620
37                                             290479
38                                             291579
39                                             405089
40                                             414856
41    Oryza sativa subsp. japonica (rice)      LOC_Os04g46580      3' UTR
42                                             LOC_Os01g69830      CDS
43                                             LOC_Os02g04680      CDS
44                                             LOC_Os02g07780      CDS
45                                             LOC_Os06g45310      CDS
46                                             LOC_Os06g49010      CDS
47                                             LOC_Os07g32170      CDS
48                                             LOC_Os08g39890      CDS
49                                             LOC_Os08g41940      CDS
50                                             LOC_Os09g31438      CDS
51                                             LOC_Os09g32944      CDS
52                                             LOC_Os11g30370      CDS
53                                             LOC_Os01g18850
54                                             LOC_Os02g08070
55                                             LOC_Os03g61760
56                                             LOC_Os04g56170
57                                             LOC_Os05g33810
58                                             LOC_Os06g44860
59                                             LOC_Os08g40260
60    Physcomitrella patens (moss)             69445               CDS
61                                             74968               CDS
62                                             93998               CDS
63                                             168927              CDS
64                                             168928              CDS
65                                             8925
66                                             19787
67                                             19788
68                                             29422
69                                             29851
70                                             74970
71                                             83876
72                                             90199
73                                             97909
74    Populus trichocarpa (poplar)             743829              3' UTR
75                                             570289              CDS
76                                             576281              CDS
77                                             733659              CDS
78                                             755123              CDS
79                                             769914              CDS
80                                             179090
81                                             179183
82                                             197948
83                                             216243
84                                             226094
85                                             235814
86                                             245406
87                                             263406
88                                             267542
89                                             274234
90                                             286316
91                                             286321
92                                             298307
93                                             409154
94                                             412443
95                                             415293
96                                             560022
97                                             647067
98                                             656549
99                                             656553
100                                            798319
101                                            832886
102                                            833398
103   Sorghum bicolor (sorghum)                4160487             CDS
104                                            4160700             CDS
105                                            4748489             CDS
106                                            5003160             CDS
107                                            5003651             CDS
108                                            5042307             CDS
109                                            5054656             CDS
110                                            5059026             CDS
111                                            5062217             CDS
112                                            4112095
113                                            4163165
114                                            4785561
115                                            4814910
116                                            4862557
117                                            4974192
118                                            4985600
119                                            5047795
120                                            5059084
121                                            5060486
122   Selaginella moellendorffii (lycophyte)   17777
123                                            28598
124                                            28626
125                                            28629
126                                            28630
127                                            28635
128                                            49859
129                                            59543
130                                            59991
131                                            79699
132                                            437670
133   Vitis vinifera (grape)                   GSVIVT00002776001   CDS
134                                            GSVIVT00017032001   CDS
135                                            GSVIVT00017953001   CDS
136                                            GSVIVT00019157001   CDS
137                                            GSVIVT00025360001   CDS
138                                            GSVIVT00002800001
139                                            GSVIVT00002959001
140                                            GSVIVT00003071001
141                                            GSVIVT00004625001
142                                            GSVIVT00008511001
143                                            GSVIVT00018616001
144                                            GSVIVT00019158001
145                                            GSVIVT00019711001
146                                            GSVIVT00019851001
147                                            GSVIVT00027720001
148                                            GSVIVT00028195001
149                                            GSVIVT00030009001
150                                            GSVIVT00037879001
151   Zea mays (maize)                         GRMZM2G163813       3' UTR
152                                            GRMZM2G040785       CDS
153                                            GRMZM2G061734       CDS
154                                            GRMZM2G065451       CDS
155                                            GRMZM2G097275       CDS
156                                            GRMZM2G101511       CDS
157                                            GRMZM2G126018       CDS
158                                            GRMZM2G148467       CDS
159                                            GRMZM2G307588       CDS
160                                            GRMZM2G390470       CDS
161                                            GRMZM2G414805       CDS
162                                            GRMZM2G450128       CDS
163                                            GRMZM2G460544       CDS
164                                            GRMZM2G024760
165                                            GRMZM2G036297
166                                            GRMZM2G058588
167                                            GRMZM2G067624
168                                            GRMZM2G080065
169                                            GRMZM2G081127
170                                            GRMZM2G098557
171                                            GRMZM2G101499
172                                            GRMZM2G102758
173                                            GRMZM2G106798
174                                            GRMZM2G109354
175                                            GRMZM2G113779
176                                            GRMZM2G126827
177                                            GRMZM2G133279
178                                            GRMZM2G133646
179                                            GRMZM2G138421
180                                            GRMZM2G156621
181                                            GRMZM2G156756
182                                            GRMZM2G168229
183                                            GRMZM2G169270


Data sources: green algae, moss, lycophyte, poplar and sorghum (http://www.jgi.doe.gov/genome-projects/; the release version is 4.0, 1.1, 1.0 and 1.0 for the first four species, respectively); grape (http://www.phytozome.net, v6.0); Arabidopsis (http://www.arabidopsis.org/, release 10); rice (http://rice.plantbiology.msu.edu/, v6.1); maize (http://www.maizesequence.org/index.html, release 5b.60). The gene sequences and CDS of the SBP-box genes of each species were downloaded from the above databases. All transcript, protein and domain sequences were downloaded from PlnTFDB (v3.0) (http://plntfdb.bio.uni-potsdam.de/v3.0/).
Note: 10 locus identifiers of SPL genes are obsolete: GRMZM2G006850, GRMZM2G015007, GRMZM2G020881, GRMZM2G075639, GRMZM2G090058, GRMZM2G108162, GRMZM2G114243, GRMZM2G145615, GRMZM2G154844 and GRMZM2G160932. Another 6 SPL genes were newly added: GRMZM2G307588, GRMZM2G390470, GRMZM2G414805, GRMZM2G450128, GRMZM2G460544 and AT1G76580. All the altered genes are from maize except for AT1G76580 (from Arabidopsis).


